Beyond No Significant Difference: Differentiating Learning Outcomes Using Multidimensional Content Analysis
Abstract
Accurately assessing student learning outcomes presents a complex challenge to educators. Valid comparative assessments of different groups of students are especially difficult. Different techniques exist, one of the most common being statistical hypothesis testing of examination scores or other graded instruments. The common conclusion of “no significant difference” based on such tests offers little useful or prescriptive information, especially when evaluating online learning experiences. As more institutions move to this knowledge delivery platform, understanding whether distance students learn as well as (or better than) traditional students becomes critical. This case study demonstrates the application of a qualitative assessment technique, content analysis, and how its inclusion in the assessment process leads to more useful results. Specifically, the participation of third-party reviewers, attention to appropriate psychometric measures, and statistical tests of transformed qualitative data lead to pedagogical insight, accurate assessment of traditional versus online learning, and opportunities for curriculum improvement.

Introduction

The evolving acceptance of distance education by higher education institutions, particularly that delivered online via the World Wide Web, has been bolstered by researchers’ findings of no statistical significance in numerical results between students in online and traditional classrooms. However, the reliance on final grades and test scores, even though derived from instruments designed to express the depth of learning, presents potentially severe limitations in the process of assessing student learning outcomes. For example, averaging may mask students’ achievement of specific learning objectives, making it difficult to create complete profiles of student learning. The plethora of “no significant difference” results, therefore, does not help improve pedagogy or convince critics that the online platform may, in fact, enable students to achieve or surpass the learning experienced in the traditional classroom. It is essential to include different, but not necessarily new, analytical approaches that can contribute to the ongoing assessment of online learning. The authors reviewed several techniques that may be applicable and decided to apply multidimensional content analysis. We propose that applying this technique to student-generated research papers, rather than applying statistical tests to results from instructor-generated instruments, provides a different perspective on online learning issues and creates a more robust view of student learning.

Literature Review

Distance learning courses and programs proliferate at institutions of higher education. The Council for Higher Education Accreditation (2002) estimated collegiate enrollment in distance education at 2.2 million students in 2002. Additional projections suggested that approximately 85% of all higher education institutions offered some form of distance education. Colleges and universities market their courses and programs aggressively, leaving student consumers at risk concerning the cost, quality, and utility of their education. Regional accrediting bodies recognize the legitimacy of distance education as a viable alternative to traditional on-campus program and curriculum delivery (CHEA 2002). Results from a recent survey (Allen and Seaman 2003) sent to chief academic officers at 3,033 degree-granting institutions confirmed this expected growth, specific to the online environment.
Based on the 32.8% response rate:
• 81% of all institutions offer at least one fully online or blended (30-80% of content delivered via the Internet) course.
• 34% of all institutions offer complete online degree programs.
• Among public institutions, 97% offer at least one online or blended course, and 49% offer an online degree program.
• 67% of respondents indicated online education is a critical long-term strategy.

While faculty seem somewhat less progressive than administrators in their embrace of online education (40% of respondents indicated their faculty do not fully accept the value and legitimacy of online education), the majority of administrators (57%) believe the learning outcomes for online courses and programs are equal or superior to those of face-to-face instruction. Best practices for student learning, regardless of delivery medium, must be identified and instructor behavior modified accordingly to improve the quality of education. The keys to success include faculty recognition of the need for different approaches to pedagogy, learning, and assessment in the online environment.

Distance education is simply defined as an academic environment that separates the instructor and learner during the majority of an instructional process; uses educational media to unite teacher and learner and to carry course content; and facilitates two-way communication between instructor and learner (Steiner 1995). This definition allows a multitude of delivery media and methodologies, ranging from written correspondence and videoconferencing to virtual classrooms. It also suggests the difficulty in differentiating relevant research results frequently grouped under this umbrella. For the purpose of this review, distance education is limited to courses delivered online via the World Wide Web, taking advantage of asynchronous learning platforms such as Blackboard or WebCT that also include a synchronous activity: real-time chats.

Questions about course quality and student learning in distance education programs reinforce the need for rigorous and valid approaches to outcomes assessment. Accrediting bodies seek to address these issues (AACSB 1999; CHEA 2002; Middle States Commission on Higher Education 2002), and benchmarking activities to identify existing best practices (albeit frequently anecdotal) have commenced (Phipps et al. 1998; IHEP 2000). The simple approach of performing statistical tests of examination scores from traditional and online courses is insufficient to support prescriptive conclusions about the comparative quality of distance education. Notwithstanding the limited insights derived from the “no significant difference” conclusion of such statistical tests, empirical results proliferate (Russell 2003), as if, by sheer number, they will establish equivalency of learning outcomes. The lack of an empirically validated set of best practices for distance education makes this comparative assessment process even more difficult and inconclusive. The research results listed in Russell’s (2003) “nosignificantdifference” and “significantdifference” websites are testament to the preponderance of research based on comparing final grades, test scores, grades on papers, and student evaluations. These results are well documented in the literature; therefore, a concise review of several results suffices.
Dellana, Collins, and West (2000) concluded that online and traditional methods are equally effective because no significant difference was found in final student scores in an undergraduate management science course. Gagne and Shepherd (2001) found similar results in a graduate-level accounting course. In contrast, Schutte (1998) discovered that online students scored significantly better than traditional students on exams in a social statistics course, while Brown and Liedholm (2002) found traditional students performed significantly better on exams than online students in a microeconomics course. Clearly, this type of research, while helpful, will never definitively conclude whether student learning is comparable for online and traditional students. Instead, extensions of this evaluative process must be made to advance the research.

Methodology

Application of multiple assessment techniques is necessary to derive reliable results specific to individual students and course delivery environments, i.e., traditional versus online. The principal author’s pedagogy incorporates this perspective, regardless of environment. The intent is to enable students to succeed and to allow ample opportunity for them to demonstrate their achievement of a course’s behavioral objectives. Measuring how well students achieve the behavioral objectives for an MBA statistics course requires frequent and varied assessments of direct applications of statistical methodology and techniques to practical business problems, issues, and opportunities (Love and Hildebrand 1992; Parker et al. 1999; Snee 1993).

One of the difficulties associated with conducting valid and reliable empirical research is the risk inherent for the student participants. For them, a course is not an experiment; in the case presented here it is their one semester of business statistics in their MBA program. To minimize any associated risks, delivery was held as constant as possible. In this case analysis, the samples were drawn from the same population; the instructor, textbook, and supplementary materials remained unchanged; and the assignments, examinations, and technology varied minimally (similar to changes in a traditional course from semester to semester).

Table 1 presents results for three graded assessments from a class of traditional students and a class of online students. Small-sample t-tests suggest there is no significant difference in the mean test scores for the midterm exam, but that there are significant differences for the final exam and the research paper (p < 0.05) in favor of the traditional course delivery. How might we reconcile these conflicting conclusions? What other techniques may be appropriate to form a more reliable conclusion about the depth of learning? What other techniques would enrich a straightforward statistical test of scores assigned by an instructor, regardless of the assessment type? One potential technique would employ a multidimensional approach on exams by creating test questions that target different dimensions. A factor analysis could verify that the questions load on the intended dimensions, with a comparison of summary statistics ensuing. However, this is still an initially quantitative assessment, with students responding to instructor-prepared questions. Qualitative techniques, in particular content analysis, may be more effective.
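The two-sample t-tests reported in Table 1 can be reproduced from the group means and standard deviations alone. The following is a minimal sketch, assuming Python with SciPy; the class sizes (12 online, 11 traditional) are not reported in this excerpt and are assumed here purely for illustration, and the research paper comparison in Table 1 appears to reflect an unequal-variance test, so that row will not match a pooled-variance calculation exactly.

```python
# Sketch: two-sample t-tests computed from the summary statistics in Table 1.
# Class sizes are assumed (not reported in the paper) and serve only to
# illustrate the calculation.
from scipy import stats

n_online, n_traditional = 12, 11   # assumed sample sizes

assessments = {
    # name: (online mean, online SD, traditional mean, traditional SD)
    "Midterm Exam":   (76.08, 8.64, 71.73, 9.42),
    "Final Exam":     (55.33, 9.07, 64.82, 9.33),
    "Research Paper": (84.92, 14.43, 95.27, 4.17),
}

for name, (m1, s1, m2, s2) in assessments.items():
    # Pooled-variance (Student) t-test from summary statistics; the research
    # paper row of Table 1 may instead assume unequal variances.
    t, p = stats.ttest_ind_from_stats(m1, s1, n_online,
                                      m2, s2, n_traditional,
                                      equal_var=True)
    print(f"{name}: t = {t:.2f}, two-tailed p = {p:.4f}")
```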
Table 1: Statistical Test of H0: No Difference in Mean Scores (for three sets of measurements)
Midterm Exam: Online mean 76.08 (SD 8.64), Traditional mean 71.73 (SD 9.42); t = 1.16, two-tailed p = 0.2604
Final Exam: Online mean 55.33 (SD 9.07), Traditional mean 64.82 (SD 9.33); t = -2.47, two-tailed p = 0.0221
Research Paper: Online mean 84.92 (SD 14.43), Traditional mean 95.27 (SD 4.17); t = -2.38, two-tailed p = 0.0347

Qualitative analysis can provide an alternative view of student learning beyond that captured in purely quantitative analysis, and it may be more relevant to several disciplines where qualitative research is typically applied. This could enhance credibility for the online environment in those fields. Conducted by the instructor, however, this methodology simply constitutes a redundant review, probably consisting of the same criteria used for the original grading of a given assessment instrument. What we suggest here are multiple independent, third-party reviews followed by appropriate quantitative analysis of the results.

Content analysis distinguishes emergent themes evident in written representations from sample participants that may reveal more information than summary quantitative techniques (Weber 1990). Already applied in varying degrees in relevant empirical analyses available in the literature (Johnson 2002; Johnson et al. 2000; Maki et al. 2000), this technique is useful in exploratory research, theory development, hypothesis testing, and applied research, and may be used for either descriptive or inferential conclusions. Smith (2000) suggested a detailed step-by-step approach to applying content analysis techniques. Our research process (and the presentation of this case analysis) follows an abbreviated sequence, appropriate for our specific research objective.

Content analysis, or indeed any analysis technique, qualitative or quantitative, used in isolation remains insufficient for useful prescriptive conclusions in the field of assessing learning outcomes. Limited sample sizes, potential for bias, and numerous uncontrolled environmental variables are just a few of the complicating factors inherent in assessing learning outcomes in general, but prevalent in comparative research. Figure 2 summarizes a typical comparative assessment process. We propose expansion and extension of this process via application of the research process presented in Figure 1.

Figure 1. Proposed Research Process
1. State the research problem and research goal.
2. State the hypothesis being tested.
3. Decide what type of qualitative material to use.
4. Decide on the coding system, i.e., rating dimensions.
5. Code material and determine intercoder agreement.
6. Apply statistical tests and interpret results.

Step 1. Research Problem and Goal

How do we measure whether there is a difference in the quality and extent of learning for students in traditional courses versus those in corresponding web-based online courses? We seek to contribute to the methodologies applied in answering this question by demonstrating the application of content analysis in the case of a graduate-level business statistics course. The samples involved are similar in size, drawn from the same source population, and participating in the same MBA curriculum, with control for the instructor, the course materials, and the course content.

Step 2. Hypothesis

H0: Learning MBA-level business statistics is not contingent on the delivery environment.
More practically stated: there is no difference in learning MBA-level business statistics between students in the traditional environment and students in the online environment.

Step 3. Qualitative Assessment

Assessment techniques for the course being investigated included formal examinations (administered via an online platform in both scenarios), spreadsheet problem solutions, discussion forums, and research papers. This report focuses on the research papers, with the expectation that they should demonstrate students’ abilities to articulate key statistical concepts, thereby demonstrating their grasp of the course content.

Figure 2. Comparative Online-Traditional Assessment Process (Deliver Courses; Measure Outcomes: Examinations, Research Paper Content; Conduct Comparative Analyses; State Conclusions)

The topic for the research paper was "Using Statistics to Misrepresent the Truth." Students received the following instructions on the very first day of class in both course environments: Students must read and/or view media, government, and/or industry reports on any subject (e.g., marketing ads, political issues, health care, safety and security, etc.) for several weeks to identify misrepresentation (under- or over-stating) by the selective (intentional or unintentional) use of statistics in a subject area of interest. At any point in this review, students must create a one-page proposal specifying the subject area they have identified for further investigation. The final paper must number 5 to 7 pages, excluding title and reference pages. It must contain at least five distinct reference citations from legitimate academic, government, industry, or media sources using APA Publication Manual format.

Students also received the following grading criteria and point allocations:
Format/Grammar/References: 20
Presentation of the Subject Issue: 20
Evidence of Misrepresentation: 20
Interpretation and Explanation: 20
Overall Effect: 20

Table 2 lists the course content students experienced prior to the research paper’s due date.

Step 4. Coding System

The content analysis coding system specifies the information that must be evident in the qualitative assessment. It seeks to provide a reasonable basis for objectivity by the reviewers (coders) by making distinctions explicit. Smith (2002) suggested the coding system include:
1. Definitions of the units of material.
2. Categories or dimensions of classification.
3. Rules for applying the system.

The coding unit represents a specific dimension (a statistical topic or concept) that demonstrates what students have learned through their ability to recognize it in their research. Each coding unit (in this case we began with eight) was then differentiated by degree and assigned scale definitions that remained consistent across all units. These scale definitions represent intensity or degree, similar to the Likert interval scales (Oppenheim 1992) common in behavioral research. Table 3 depicts this information for the first dimension, Type of Data.
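Because the coding system is an explicit rubric, it can be represented directly as a small data structure that also serves as the coders’ checksheet. The sketch below is illustrative only: it encodes the Type of Data dimension shown in Table 3 below, leaves the remaining seven dimensions to Appendix A, and the ChecksheetEntry class and its names are hypothetical conveniences rather than part of the original study.

```python
# Sketch: the coding scheme as an explicit rubric plus one coder's checksheet
# entry. Only the "Type of Data" dimension is defined in this excerpt; the
# remaining dimensions would come from Appendix A.
from dataclasses import dataclass, field

CODING_SCHEME = {
    "Type of Data": {
        1: "No evidence the student recognized the type(s) of data used.",
        2: "Some evidence the student recognized the type(s) of data used.",
        3: "Some evidence the student recognized the type(s) of data used and "
           "considered type-specific limits on analysis or interpretation.",
    },
    # ... dimensions 2-8 as defined in Appendix A ...
}

@dataclass
class ChecksheetEntry:
    """One coder's independent ratings of one research paper (hypothetical helper)."""
    coder: str
    paper_id: str
    ratings: dict = field(default_factory=dict)   # dimension -> rating (1-3)

    def rate(self, dimension: str, level: int) -> None:
        # Reject ratings that fall outside the published scale definitions.
        if level not in CODING_SCHEME.get(dimension, {}):
            raise ValueError(f"Rating {level} is not defined for {dimension!r}")
        self.ratings[dimension] = level

# Example: one coder rates one paper on the first dimension.
entry = ChecksheetEntry(coder="Coder 1", paper_id="Paper 01")
entry.rate("Type of Data", 1)
```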
Table 2: Statistical Analysis & Design Course Content Experienced Prior to Research Paper Due Date
Introduction: Populations and samples; Sampling a population
Descriptive Statistics: Shape of a distribution; Measures of central tendency; Measures of variation; Qualitative data; Scales of measurement; Scatterplots
Probability: Sample spaces and events; Elementary rules; Conditional probability; Independence of events
Discrete Random Variables: Discrete probability distributions; Binomial distribution; Poisson distribution
Continuous Random Variables: Continuous probability distributions; Uniform distribution; Normal distribution
Sampling Distributions: Sampling distribution of the sample mean; Sampling a process; Sampling distribution of the sample proportion

The entire scheme covering all eight dimensions is summarized in Appendix A. Note that this scheme was developed before the coders viewed the research manuscripts.

Table 3: Coding Scheme for Type of Data Dimension
Rating 1: No evidence that the student recognized the type(s) of data from which statistics, graphs, or other information were derived.
Rating 2: Some evidence that the student recognized the type(s) of data from which statistics, graphs, or other information were derived.
Rating 3: Some evidence that the student recognized the type(s) of data from which statistics, graphs, or other information were derived and considered type-specific limitations on the analyses that could be performed or the interpretations or conclusions that could be drawn.

Coding scales must clearly differentiate levels so that different coders can agree on the level that a sample element, in this case a student’s research paper, most closely fits. The range of the scale can vary substantially, from as few as two levels in the simple dichotomous case to as many as six in a well-defined intensity scale (Smith 2000). Frequently, the issue being studied and the language possibilities themselves limit how broad or narrow such scales may be (Weber 1990). The key is careful articulation of the entire coding system to facilitate a common understanding that results in its consistent application across all sample elements.

Step 5. Code Material and Assess Intercoder Agreement

Having the instructor participate in the actual coding would serve no purpose. This individual had already reviewed this particular assignment for both samples as part of his routine course activities. In effect, such a review would be redundant and self-serving; it would at best constitute validation of implicit links to the original grading criteria published to students, and at worst suggest inconsistent or invalid links to the course’s behavioral objectives. Instead, two coders, neither of whom had participated in this course, this program, or this university, but who had the requisite knowledge and skills in both content and pedagogy, accepted this role. Each received the original research paper files, the coding scheme presented in Appendix A, and an identical checksheet for recording their assessments independently. Discussion between the coders was limited to a review of the coding scheme, in particular the scale definitions, to ensure a common understanding of the process and the expectations. Prior to quantitative analysis of the coders’ ratings, reliability checks should be performed (John and Benet-Martinez 2000). At this point in the process, we sought to establish the reliability of the coder as a measuring instrument, referred to as intercoder agreement (Smith 2002).
As Smith indicated, such agreement constitutes evidence of the objectivity of the coding system. It is a necessary, but not sufficient, condition for the validity of the scoring. The actual measure of this agreement varies, depending on the type of rating levels created, either continuous or categorical. In our analysis, we chose the former based on the interval scale created for this review. Like a progressively inclusive satisfaction scale, each level contains the previous one, eliminating the independence/mutual exclusivity requirements of categorical scales.

Tables 4 and 5 summarize the results from the two coders. Differences in the distribution of the ratings are evident in several dimensions, both within and between the two samples. Table 6, depicting the average scores by dimension and coder, confirms these differences, notably for dimensions 5 through 8 in Sample A and for all but dimension 1 in Sample B. Since both raters scored the Type of Data dimension identically, with all students’ papers rated 1 (no evidence the student recognized the type of data), this dimension was eliminated from further consideration.

Table 5: Coder #2 Results (frequencies of ratings by dimension, dimensions 1-8)
Sample A, rating 1: 12, 0, 6, 7, 9, 11, 7, 7
Sample A, rating 2: 0, 10, 4, 4, 3, 0, 1, 2
Sample A, rating 3: 0, 2, 2, 1, 0, 1, 4, 3
Sample B, rating 1: 11, 3, 1, 6, 5, 5, 6, 6
Sample B, rating 2: 0, 7, 7, 4, 6, 5, 4, 4
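The paper does not specify which agreement statistic was computed beyond treating the ratings as interval-scaled. As one sketch of how such a reliability check might be run, the per-paper ratings from the two coders on a given dimension (hypothetical vectors below; the actual ratings underlie the frequencies in Tables 4 and 5) could be compared by exact percent agreement and by a Pearson correlation, assuming Python with NumPy and SciPy. A weighted kappa or an intraclass correlation would serve the same purpose; the point is that the reliability check precedes any comparative statistical test of the transformed ratings.

```python
# Sketch: quantifying intercoder agreement for interval-type ratings on one
# dimension. The rating vectors are hypothetical placeholders; in the study
# they would be the two coders' 1-3 ratings of the same papers.
import numpy as np
from scipy import stats

coder1 = np.array([1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 2, 1])   # hypothetical
coder2 = np.array([1, 2, 3, 3, 1, 2, 3, 2, 1, 2, 2, 1])   # hypothetical

exact_agreement = np.mean(coder1 == coder2)   # share of identical ratings
r, p = stats.pearsonr(coder1, coder2)         # linear association between coders

print(f"Exact agreement: {exact_agreement:.2f}")
print(f"Pearson r = {r:.2f} (two-tailed p = {p:.3f})")
```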